
    A methodology for the semiautomatic annotation of EPEC-RolSem, a Basque corpus labeled at predicate level following the PropBank-VerbNet model

    In this article we describe the methodology developed for the semiautomatic annotation of EPEC-RolSem, a Basque corpus labeled at predicate level following the PropBank-VerbNet model. The methodology presented is the product of a detailed theoretical study of the semantic nature of verbs in Basque and of their similarities and differences with verbs in other languages. As part of the proposed methodology, we are creating a Basque lexicon on the PropBank-VerbNet model that we have named the Basque Verb Index (BVI). Our work thus dovetails with the general trend toward building lexicons from tagged corpora that is clear in work conducted for other languages. EPEC-RolSem and BVI are two important resources for the computational semantic processing of Basque; as far as the authors are aware, they are also the first resources of their kind developed for Basque. In addition, each entry in BVI is linked to the corresponding verb entry in well-known resources such as PropBank, VerbNet, WordNet, Levin's Classification and FrameNet. We have also implemented several automatic processes to aid in creating and annotating the BVI, including processes designed to facilitate the task of manual annotation.
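    The cross-resource linking described above can be pictured as a simple record type. The sketch below is a hypothetical illustration only: the field names and example identifiers (e.g. the PropBank roleset `bring.01`) are assumptions for illustration, not the project's actual schema.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch of a BVI entry linking a Basque verb to external
# resources; field names and example identifiers are assumptions.
@dataclass
class BVIEntry:
    lemma: str                       # Basque verb lemma
    propbank: Optional[str] = None   # PropBank roleset id
    verbnet: Optional[str] = None    # VerbNet class
    wordnet: Optional[str] = None    # WordNet synset key
    levin: Optional[str] = None      # Levin class label
    framenet: Optional[str] = None   # FrameNet frame name

entry = BVIEntry(lemma="ekarri", propbank="bring.01", framenet="Bringing")
```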

    HiTZ@Antidote: Argumentation-driven Explainable Artificial Intelligence for Digital Medicine

    Providing high-quality explanations for AI predictions based on machine learning is a challenging and complex task. To work well it requires, among other factors: selecting a proper level of generality/specificity for the explanation; considering assumptions about the familiarity of the explanation's beneficiary with the AI task under consideration; referring to specific elements that have contributed to the decision; making use of additional knowledge (e.g. expert evidence) that might not be part of the prediction process; and providing evidence supporting negative hypotheses. Finally, the system needs to formulate the explanation in a clearly interpretable, and possibly convincing, way. Given these considerations, ANTIDOTE fosters an integrated vision of explainable AI, where low-level characteristics of the deep learning process are combined with higher-level schemes proper to the human argumentation capacity. ANTIDOTE will exploit cross-disciplinary competences in deep learning and argumentation to support a broader and innovative view of explainable AI, where the need for high-quality explanations for clinical case deliberation is critical. As a first result of the project, we publish the Antidote CasiMedicos dataset to facilitate research on explainable AI in general, and on argumentation in the medical domain in particular. Comment: to appear in SEPLN 2023: 39th International Conference of the Spanish Society for Natural Language Processing.

    Corpusen etiketatze linguistikoa

    In this article, we shall comment on the steps that have to be taken to give a linguistic label to a corpus and the difficulties that appear in this process. Our main objective was to highlight the importance of the labelling when preparing a corpus that is useful for linguistic research, and the need to establish criteria and to take the decisions that this entails. We also explain how semi-automatic methods are applied and how the manual revision that guarantees the quality of the corpus is carried out. Once the corpus has been revised and labelled, it will be useful for carrying out linguistic analyses, for improving or assessing linguistic tools and resources, and for channelling automatic studies.

    European language equality

    This deep dive on data, knowledge graphs (KGs) and language resources (LRs) is the last of the four technology deep dives, as data and related models are the basis for technologies and solutions in the area of Language Technology (LT) for European digital language equality (DLE). This chapter focuses on the data and LRs required to achieve full DLE in Europe by 2030. The main components identified – data, KGs, LRs – are explained and used to analyse the state of the art as well as to identify gaps. All of these components need to be tackled in the future, for the widest range of languages possible, from official EU languages to dialects to non-EU languages used in Europe. For all these languages, efficient data collection and sustainable data provision must be facilitated with fair conditions and costs. Specific technologies, methodologies and tools have been identified to enable the implementation of the vision of DLE by 2030. In addition, data-related business models and data-governance models are discussed, as they are considered a prerequisite for a working data economy that stimulates a vibrant LT landscape able to bring about European DLE.

    Creación y Simulación de Metodologías de Análisis, Clasificación e Integración de Nuevos Requerimientos a Software Propietario

    Prioritizing the new requirements to be implemented in proprietary software is fundamental to its maintenance, the preservation of its quality, and compliance with the company's business rules and standards. Although prioritization tools based on proven and well-known techniques exist, they require each requirement to be rated beforehand. When a company receives requests from several clients of the same product, the factors affecting the company multiply; the available tools do not cover these aspects and make the rating task far more complex. This research project comprises a survey of the methods for prioritizing and selecting new requirements used by companies in the Rosario area, and the definition of a methodology for selecting a new requirement, which involves analysing and evaluating all of its implications for the software product and the company while respecting its business rules. The resulting methodology leads to the definition of processes for building a tool for rating and prioritizing new requirements in proprietary software with requests from several clients at once; the tool will provide rating instruments that consider all the relevant aspects, offer current prioritization techniques, and generate reports tailored to different company perspectives. Track: Software Engineering. Red de Universidades con Carreras en Informática (RedUNCI).
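    A weighted-scoring scheme of the sort such a rating tool might use can be sketched as follows; the criteria and weights are hypothetical, chosen only to illustrate combining ratings from several perspectives into a single priority order.

```python
from dataclasses import dataclass

# Hypothetical criteria and weights; a real tool would derive these from
# the company's business rules and standards.
WEIGHTS = {"business_value": 0.4, "client_count": 0.3,
           "low_effort": 0.2, "low_risk": 0.1}

@dataclass
class Requirement:
    name: str
    scores: dict  # criterion -> rating on a 1-5 scale

def prioritize(requirements):
    """Rank requirements by their weighted total score, highest first."""
    def total(req):
        return sum(w * req.scores.get(c, 0) for c, w in WEIGHTS.items())
    return sorted(requirements, key=total, reverse=True)

ranked = prioritize([
    Requirement("cosmetic tweak", {"business_value": 1, "client_count": 1,
                                   "low_effort": 5, "low_risk": 5}),
    Requirement("multi-client export", {"business_value": 5, "client_count": 5,
                                        "low_effort": 3, "low_risk": 4}),
])
```

    A real tool would add per-client weighting and rule-based constraints on top of a scheme like this; the point here is only that pre-rating each requirement against explicit criteria makes the ranking reproducible.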

    Cálculo de distancia lingüística para textos históricos en euskera

    Measuring the distance between languages, dialects and language varieties, both synchronically and diachronically, is a topic of growing interest in NLP. Based on our Syntactically Annotated Historical COrpus in BAsque (SAHCOBA) and previous work on perplexity-based language distance proposed by Gamallo, Pichel and Alegria (2017, 2020), we have compared historical corpora with current texts in the standard variety and calculated the language distances between them. As standard Basque is based on the central dialects, the starting hypothesis is that the oldest texts and the dialects on the extremes will be the most distant. The results obtained have largely confirmed the theses of traditional dialectology: peripheral dialects show a strong idiosyncrasy and are more distant from the standard than the rest. This research has been partially supported by the Agence nationale de la recherche of France (ANR-17-CE27-0011-BIM); the Ministry of Science, Innovation, and Universities of Spain (RTI2018-098082-J-I00); and the Basque Government (IT1570-22).
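    The perplexity-based approach of Gamallo, Pichel and Alegria can be illustrated with a minimal character n-gram sketch: train a smoothed model on one variety and measure its perplexity on another, with higher perplexity read as greater distance. This is a simplified illustration under assumed choices (trigrams, add-one smoothing), not the authors' implementation.

```python
import math
from collections import Counter

def char_ngrams(text, n):
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def train_lm(text, n=3):
    """Count character n-grams and their (n-1)-gram contexts."""
    return Counter(char_ngrams(text, n)), Counter(char_ngrams(text, n - 1))

def perplexity(text, model, n=3, alpha=1.0):
    """Add-one-smoothed perplexity of `text` under `model`."""
    ngrams, contexts = model
    vocab = len(set(text))
    grams = char_ngrams(text, n)
    log_prob = 0.0
    for g in grams:
        num = ngrams.get(g, 0) + alpha
        den = contexts.get(g[:-1], 0) + alpha * vocab
        log_prob += math.log(num / den)
    return math.exp(-log_prob / len(grams))

# A model trained on one text is less perplexed by that text than by a
# very different one; the gap serves as a crude distance signal.
standard = "egun on mundua " * 50
other = "zxqw vbnm plok " * 50
model = train_lm(standard)
```

    Gamallo et al. symmetrise this into a distance by combining cross-perplexities in both directions; the sketch above shows only one direction.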

    The First Annotated Corpus of Historical Basque

    This article presents the elaboration of a morphosyntactically annotated diachronic corpus of Basque, and the first results obtained in processing historical varieties of this language with computational techniques. The corpus size is around one million words, spanning from the 15th to the mid-18th century and encompassing the most significant written production in all historical dialects. Morphosyntactic tagging allows for systematic searches at different levels of complexity; additionally, a rich set of metadata enables searches based on sociohistorical criteria as well. This is not only the first tagged corpus of historical Basque but also a means to improve language-processing tools by analyzing historical varieties more or less distant from the present-day standard language. Moreover, this project aims to set a model for further work on historical corpora of Basque and to inform similar projects on other languages.
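    The combination of tag-level and metadata-level search described above can be pictured with a toy filter; the token fields, tag values and metadata keys below are invented for illustration and do not reflect the corpus's actual annotation scheme.

```python
# Hypothetical tokens from a tagged historical corpus; fields are assumptions.
tokens = [
    {"form": "gizonak", "lemma": "gizon", "pos": "NOUN",
     "doc": {"dialect": "Biscayan", "century": 17}},
    {"form": "ethorri", "lemma": "etorri", "pos": "VERB",
     "doc": {"dialect": "Lapurdian", "century": 16}},
]

def search(tokens, **criteria):
    """Keep tokens matching every criterion, whether it lives at the
    annotation level (e.g. pos) or in the document metadata (e.g. century)."""
    def matches(tok):
        return all(tok.get(k, tok["doc"].get(k)) == v
                   for k, v in criteria.items())
    return [t for t in tokens if matches(t)]

hits = search(tokens, pos="VERB", century=16)
```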

    Normalización de Textos Históricos Vascos

    This paper presents a computational method, and its evaluation in a real scenario, for normalising historical Basque texts so that they can be analysed with standard Natural Language Processing (NLP) tools. This normalisation work is part of a larger ongoing project called Basque in the Making (BIM): A Historical Look at a European Language Isolate, whose main objective is the systematic and diachronic study of a number of grammatical features of the Basque language. The research leading to these results was carried out as part of the BIM project (Agence Nationale de la Recherche, France) and the BERBAOLA project (Basque Government funding, Elkartek KK-2017/00043).
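    One common family of approaches to this kind of normalisation, rewrite rules plus a similarity lookup against a standard lexicon, can be sketched as below; the rules, words and threshold are invented examples, not the method or data actually used in BIM.

```python
import difflib

# Hypothetical orthographic rewrite rules for older Basque spellings.
RULES = [("ç", "z"), ("qu", "k"), ("v", "b")]

# Tiny stand-in for a standard-Basque word list.
LEXICON = {"zezen", "bizi", "etorri"}

def normalize(word, cutoff=0.75):
    """Apply rewrite rules, then fall back to the closest lexicon entry."""
    for old, new in RULES:
        word = word.replace(old, new)
    if word in LEXICON:
        return word
    close = difflib.get_close_matches(word, LEXICON, n=1, cutoff=cutoff)
    return close[0] if close else word
```

    In practice the rules would be learned or hand-crafted from attested spelling variation, and the lexicon would be a full standard-Basque word list; the two-stage structure (deterministic rewriting, then fuzzy matching) is the part this sketch is meant to convey.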

    Building a Syntactically Annotated Historical Corpus of Basque
